No extension of quantum theory can have improved predictive power

According to quantum theory, measurements generate random outcomes, in stark contrast with 
classical mechanics. This raises the question of whether there could exist an extension of the theory 
which removes this indeterminism, as suspected by Einstein, Podolsky and Rosen (EPR). Although 
this has been shown to be impossible, existing results do not imply that the current theory is 
maximally informative. Here we ask the more general question of whether any improved predictions 
can be achieved by any extension of quantum theory. Under the assumption that measurements 
can be chosen freely, we answer this question in the negative: no extension of quantum theory can 
give more information about the outcomes of future measurements than quantum theory itself. Our 
result has significance for the foundations of quantum mechanics, as well as applications to tasks 
that exploit the inherent randomness in quantum theory, such as quantum cryptography. 



Given a system and a set of initial conditions, clas- 
sical mechanics allows us to calculate the future evolu- 
tion to arbitrary precision. Any uncertainty we might 
have at a given time is caused by a lack of knowledge 
about the configuration. In quantum theory, on the other 
hand, certain properties — for example position and mo- 
mentum — cannot both be known precisely. Furthermore, 
if a quantity without a defined value is measured, quan- 
tum theory prescribes only the probabilities with which 
the various outcomes occur, and is silent about the out- 
comes themselves. 

This raises the important question of whether the out- 
comes could be better predicted within a theory beyond 
quantum mechanics [1]. An intuitive step towards its 
answer is to consider appending local hidden variables 
to the theory [2]. These are classical variables that allow 
us to determine the experimental outcomes (see later 
for a precise definition). Here we ask a new, more gen- 
eral question: is there any extension of quantum theory 
(not necessarily taking the form of hidden variables) that 
would convey any additional information about the out- 
comes of future measurements? 

We proceed by giving an illustrative example. Con- 
sider a particle heading towards a measurement device 
which has a number of possible settings, denoted by a pa- 
rameter, A, corresponding to the different measurements 
that can be chosen by the experimenter. The measure- 
ment generates a result, denoted X. For concreteness, 
one could imagine a spin-1/2 particle incident on a Stern-Gerlach 
apparatus. Each choice of measurement corresponds 
to a particular orientation of the device and the 
outcome is assigned depending on which way the beam 
is deflected. Within quantum theory, a description of 
the quantum state of the particle and of the measurement 
apparatus allows us to calculate the distribution, 
$P_{X|A}$, of the outcome, X, for each measurement choice, 
A. Another example is described in Figure 1. 

FIG. 1: Illustration of the scenario. A measurement is carried 
out on a particle, depicted as a photon measured using an 
arrangement comprising a polarizing beam splitter and two 
detectors. The measurement choice (the angle of the polarizing 
beam splitter) is denoted A and the outcome, X, is assigned 
-1 or 1 depending on which detector fires. On the right, we 
represent the additional information that may be provided by an 
extended theory, S, shown here taking the form of either (a) hidden 
variables, i.e., a classical list assigning outcomes, or (b) a more 
general (e.g. quantum) system. 

In this work we consider the possibility that there ex- 
ists additional, yet to be discovered, information that 
allows the outcome X to be better predicted. We do not 
place any restrictions on how this information is mani- 
fest, nor do we demand that it allows the outcomes to 
be calculated precisely. In particular, it could be that 
the additional information gives rise to a more accurate 
distribution over the outcomes. For example, in an ex- 
periment for which quantum theory predicts a uniform 
distribution over the outcomes, X, there could be additional 
information that allows us to calculate a value, X', 
such that X = X' with probability 3/4 (in the model proposed 
by Leggett [3], for instance, the local hidden variables 
provide information of this type). More generally, 
we allow for the possibility of an extended theory that 
provides non-classical information. For example, it could 
comprise a "hidden quantum system" , which, if measured 
in the correct way, gives a value correlated to X. 



I. RESULTS 
Assumptions 

In order to formulate our main claim about the non- 
extendibility of quantum theory, we introduce a frame- 
work within which any arbitrary additional information 
provided by an extension of the current theory can be 
considered. In the following, we explain this framework 
on an informal level (see the Supplementary Information 
for a formal treatment). 

The crucial feature of our approach is that it is opera- 
tional, in the sense that we only refer to directly observ- 
able objects (such as the outcome of an experiment), but 
do not assume anything about the underlying structure 
of the theory. Note that the outcome, X, of a measure- 
ment is usually observed at a certain point in spacetime. 
The coordinates of this point (with respect to a fixed 
reference system) can be determined operationally using 
clocks and measuring rods. Analogously, the measure- 
ment setting A needs to be available at a certain space- 
time point (before the start of the experiment). To model 
this, we introduce the notion of a spacetime random vari- 
able (SV), which is simply a random variable together 
with spacetime coordinates $(t, r_1, r_2, r_3)$. Operationally, 
a SV can be interpreted as a value that is accessible at 
a given spacetime point $(t, r_1, r_2, r_3)$. We now model a 
measurement process as one that takes an input, A, to 
an output, X, where both X and A are SVs. 

Our result is based on the assumption that measure- 
ment settings can be chosen freely (which we call As- 
sumption FR). We note that this assumption is common 
in physics, but often only made implicitly. It is, for exam- 
ple, a crucial ingredient in Bell's theorem (see [1]). For- 
mulated in our framework, Assumption FR is that the 
input, A, of a measurement process can be chosen such 
that it is uncorrelated with certain other SVs, namely all 
those whose coordinates lie outside the future lightcone 
of the coordinates of A. We note that this reference to a 
lightcone is only used to identify a set of SVs, and does 
not involve any assumptions about relativity theory (see 
the Supplementary Information). However, the motiva- 
tion for Assumption FR is that, when interpreted within 
the usual relativistic spacetime structure, it is equivalent 
to demanding that A can be chosen such that it is un- 
correlated with any pre-existing values in any reference 
frame. That said, the lack of correlation between the rel- 
evant SVs could be justified in other ways, for example 
by using a notion of "effective freedom" (discussed in [3]). 



We also remark that Assumption FR is consistent with 
a notion of relativistic causality in which an event B can- 
not be the cause of A if there exists a reference frame in 
which A occurs before B. In fact, our criterion for A 
to be a free choice is satisfied whenever anything cor- 
related to A could potentially have been caused by A. 
However, in an alternative world with a universal (frame- 
independent) time, one might reject Assumption FR and 
replace it with something weaker, for example that A is 
free if it is uncorrelated with anything in the past with 
respect to this universal time. Nevertheless, since exper- 
imental observations indicate the existence of relativistic 
spacetime, we use a notion of free choice consistent with 
this. 

We additionally assume that the present quantum the- 
ory is correct (we call this Assumption QM). This as- 
sumption is natural since we are asking whether quan- 
tum theory can be extended. In fact, we only require 
that two specific aspects of quantum theory hold, and so 
split Assumption QM into two parts. On an informal 
level, the first is that measurement outcomes obey quan- 
tum statistics, and the second is that all processes within 
quantum theory can be considered as unitary evolutions 
if one takes into account the environment (see the Sup- 
plementary Information for more details). We remark 
that the second part of this assumption need only hold 
for microscopic processes on short timescales and does 
not preclude subsequent wave function collapse. 



Main Findings 

Consider a measurement which depends on a setting A 
and produces an output X. According to quantum the- 
ory, we can associate a quantum state and measurement 
operators with this process from which we can compute 
the distribution $P_{X|A}$. 

We ask whether there could exist an extension of quan- 
tum theory that provides us with additional information 
(which we denote by S) that is useful to predict the out- 
come. In order to keep the description of the information, 
S, as general as possible, we do not assume that it is en- 
coded in a classical system, but instead characterize it 
by how it behaves when observed. (Formally, we model 
access to S analogously to the measurement of a quan- 
tum system, i.e., as a process which takes an input SV 
and produces an output SV.) We demand that S can be 
accessed at any time (similarly to classical or quantum 
information held in a storage device) and that it is static, 
i.e., its behaviour does not depend on where or when it 
is observed. 

Our main result is that we answer the above question 
in the negative, i.e., we show that, using Assumptions FR 
and QM, the distribution $P_{X|A}$ is the most accurate description 
of the outcomes. More precisely, for any fixed 
(pure) state of the system, the chosen measurement set- 
ting, A, is the only non-trivial information about X, and 
any additional information, S, provided by an extended 



theory is irrelevant. We express this via the Markov chain 
condition 

$$X \leftrightarrow A \leftrightarrow S. \qquad (1)$$

This condition expresses mathematically that the distribution 
of X given A and S is the same as the distribution 
of X given only A [5]. Hence, access to S does not 
decrease our uncertainty about X, and there is no better 
way to predict measurement outcomes than by using 
quantum theory. 
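
To make condition (1) concrete, the following Python sketch tests it on a finite joint distribution. It is an illustration only: the function name `is_markov_chain`, the dictionary layout and the two toy distributions are assumptions introduced here, not part of the paper.

```python
from collections import defaultdict
from itertools import product

def is_markov_chain(joint, tol=1e-12):
    """Return True if X <-> A <-> S holds for joint[(x, a, s)] = P(x, a, s).

    The condition P(x | a, s) = P(x | a) is checked in the equivalent
    product form P(x, a, s) * P(a) = P(x, a) * P(a, s).
    """
    p_a, p_xa, p_as = defaultdict(float), defaultdict(float), defaultdict(float)
    for (x, a, s), p in joint.items():
        p_a[a] += p
        p_xa[(x, a)] += p
        p_as[(a, s)] += p
    xs = {x for x, _, _ in joint}
    return all(
        abs(joint.get((x, a, s), 0.0) * p_a[a] - p_xa[(x, a)] * p_as[(a, s)]) <= tol
        for x in xs for (a, s) in p_as
    )

# Hypothetical example: X, A and S are independent fair bits, so S is useless.
useless = {(x, a, s): 0.125 for x, a, s in product((0, 1), repeat=3)}
# Hypothetical counter-example: S is a perfect copy of X.
perfect_copy = {(x, a, x): 0.25 for x, a in product((0, 1), repeat=2)}
```

In this toy setting `is_markov_chain(useless)` holds, while `is_markov_chain(perfect_copy)` fails, because knowing S would fix X.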

In the Methods, we sketch the proof of this (the full 
proof is deferred to the Supplementary Information). 



II. DISCUSSION 

We now discuss experimental aspects related to our re- 
sult. Note that at the formal level, we present a theorem 
about certain defined concepts based on certain assump- 
tions, hence what remains is to connect our definitions to 
observations in the real world, and experimentally con- 
firm the assumptions, where possible. Assumption FR 
refers to the ability to make free choices and — while we 
can never rule out that the universe is deterministic and 
that free will is an illusion — this is in principle falsifiable, 
e.g. by a device capable of guessing an experimentalist's 
choices before they are made. (See also [6] where the 
possibility of weakening this assumption is discussed.) 

The validity of Assumption QM could be argued for 
based on experimental tests of quantum theory. How- 
ever, the existence of the particular correlations we use 
in the second part of our proof is quantum-theory inde- 
pendent, so worth establishing separately. Due to experi- 
mental inefficiencies, these correlations cannot be verified 
to arbitrary precision. Figure 2 bounds our ability to experimentally 
establish (1) depending on the quality of 
the setup used (characterized here by the visibility). For 
more details, see the Methods. 

We proceed by discussing previous work on extensions 
of quantum theory. To the best of our knowledge, all 
such extensions that have been excluded to date can also 
be excluded using our result. 

The question asked by EPR [1] was whether quantum 
mechanics could be considered complete. They appealed 
to intuition to argue that an extended theory should exist 
and one might then have hoped for a deterministic com- 
pletion, i.e. one that would uniquely determine the mea- 
surement outcomes — contrast this with our (more gen- 
eral) notion, where the extended theory may only give 
partial information. Bell [2] famously showed that a de- 
terministic completion is not possible when the theory is 
supplemented by local hidden variables. (To relate this 
back to our result, this corresponds to the special case 
where the additional information, S, is a classical value 
specified by the local hidden variables. A short discus- 
sion on the term local can be found in the Supplementary 
Information.) Recently, a conclusion [7] similar to Bell's 



has been reached using the Kochen-Specker theorem [8]. 
These results have been extended to arbitrary (i.e. not 
necessarily local) hidden variables [9, 10] under the assumption 
of relativistic covariance (see also [11], as well 
as [12] where a condition slightly weaker than locality is 
used to derive a theorem similar to Bell's). 

The aforementioned papers left open the question of 
whether there could exist an extended theory which pro- 
vides additional information about the outcomes without 
determining them completely. (Note that, in his later 
works, Bell uses definitions that potentially allow probabilistic 
models [13]. However, as explained in the Supplementary 
Information, non-deterministic models are not 
compatible with Bell's other assumptions.) In the case 
that the additional information takes the form of local 
hidden variables, an answer to the above question can be 
found in [3, 14, 15], and the strongest result is that any 
local hidden variables are necessarily uncorrelated with 
the outcomes of measurements on Bell states [TOJ. (We 
remark that the model in [3] also included non-local hid- 
den variables. However, we have not referred to these 
in this paragraph, since, as mentioned below in the context 
of de Broglie-Bohm theory, the presence of non-local 
hidden variables contradicts Assumption FR.) 

In the present work, we have taken this idea further 
and excluded the possibility that any extension of quan- 
tum theory (not necessarily in the form of local hidden 
variables) can help predict the outcomes of any mea- 
surement on any quantum state. In this sense, we show 
the following: under the assumption that measurement 
settings can be chosen freely, quantum theory really is 
complete. 

We remark that several other attempts to extend quan- 
tum theory have been presented in the literature, the de 
Broglie-Bohm theory [16, 17] being a prominent example 
(this model recreates the quantum correlations in a de- 
terministic way but uses non-local hidden variables, see 
e.g. [18] for a summary). Our result implies that such 
theories necessarily come at the expense of violating As- 
sumption FR. 

Another way to generate candidate extended theories 
is via models which simulate quantum correlations. We 
discuss the implications of our result in light of such mod- 
els in the Supplementary Information. In addition, we 
remark that a claim in the same spirit as ours has recently 
been obtained based on the assumption of 
non-contextuality [19]. 

Randomness is central to quantum theory and with 
it comes a range of philosophical implications. In this 
Article we have shown that the randomness is inherent: 
any attempt to better explain the outcomes of quantum 
measurements is destined to fail. Not only is the universe 
not deterministic, but quantum theory provides the ul- 
timate bound on how unpredictable it is. Aside from 
these fundamental implications, there are also practical 
ones. In quantum cryptography, for example, the unpre- 
dictability of measurement outcomes can be quantified 



and used to restrict the knowledge of an adversary. Most 
security proofs implicitly assume that quantum theory 
cannot be extended (although there are exceptions, the 
first of which was given in [20]) . However, in this work, 
we show that this follows if the theory is correct. 



III. METHODS 

Our main result is the following theorem whose proof 
we sketch here (see the Supplementary Information for 
the formal treatment). 

Theorem 1. — For any quantum measurement with input 
SV A and output SV X and for any additional informa- 
tion, S, under Assumptions QM and FR, the Markov 
chain condition (1) holds. 

The proof is divided into three parts. The first two are 
related to a Bell-type setting, involving measurements on 
a maximally entangled state. In Part I, we show that As- 
sumption FR necessarily enforces that S is non-signalling 
(in the sense defined below). In Part II we show that 
for a particular set of bipartite correlations, if S is non- 
signalling, it cannot be of use to predict the outcomes. 
These correlations occur in quantum theory (cf. the first 
part of Assumption QM) when measuring a maximally 
entangled state and hence we conclude that no S can 
help predict the outcomes of measurements on one half 
of such a state. Finally, in Part III, we use the second 
part of Assumption QM to argue that this conclusion 
also applies to all measurements on an arbitrary (pure) 
quantum state. Together, these establish our claim. 

The bipartite scenario used for the first two parts of the 
proof involves two quantum measurements, with inputs A 
and B and respective outputs X and Y. The setup is such 
that the two measurements are spacelike separated in the 
sense that the coordinates of A are spacelike separated 
with the coordinates of Y, and, likewise, those of B are 
spacelike separated with those of X. 

As mentioned in the main text, we model the information 
provided by the extended theory, S, by its behaviour 
under observation. We introduce a SV, C, which 
can be thought of as the choice of what to observe, and 
another SV, Z, which represents the outcome of this observation. 
In terms of these variables, our main result, 
Equation (1), can be restated as follows: for all values of a, c 
and x, we have 

$$P_{Z|acx} = P_{Z|ac}. \qquad (2)$$



(Note that we use lower case to denote specific values of 
the corresponding upper case SVs.) 

Proof: Part I. The entire setup described above (including 
the additional information S, accessed by choosing 
an observable, C, and obtaining an outcome, Z) gives 
rise to a joint distribution $P_{XYZ|ABC}$. The purpose of 
this part of the proof is to show that Assumption FR implies 
that $P_{XYZ|ABC}$ must satisfy particular constraints, 
called non-signalling constraints, which characterize situations 
where operations on different isolated systems 
cannot affect each other. Formally, these are 

$$P_{YZ|ABC} = P_{YZ|BC} \qquad (3)$$
$$P_{XZ|ABC} = P_{XZ|AC} \qquad (4)$$
$$P_{XY|ABC} = P_{XY|AB} \qquad (5)$$



We remark that the observation that the assumption of 
free choice gives rise to certain non-signalling constraints 
has been made already in [11], and a similar argument has 
been presented by Gisin [9] and Blood [10]. (Note that 
the arguments in [9, 10] implicitly assume that measurements 
can be chosen freely.) 

Assumption FR allows us to make A a free choice and 
hence we have 



$$P_{A|BCYZ} = P_A \qquad (6)$$



(the setup is such that the measurements specified by 
A and B are spacelike separated and, furthermore, S is 
static, so we can consider the case where its observation 
is also spacelike separated from the measurements specified 
by A and B). Furthermore, using the definition of 
conditional probability ($P_{Q|R} := P_{QR}/P_R$), we can write 

$$P_{YZA|BC} = P_{YZ|BC} \times P_{A|BCYZ} = P_A \times P_{YZ|BC},$$

where we inserted (6) to obtain the second equality. Similarly, 
we have 

$$P_{YZA|BC} = P_{A|BC} \times P_{YZ|ABC} = P_A \times P_{YZ|ABC}.$$

Comparing these two expressions for $P_{YZA|BC}$ yields the 
desired non-signalling condition (3). By a similar argument 
the other non-signalling conditions can be inferred 
from Assumption FR. 
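
The constraints (3)-(5) can be checked mechanically on a finite table of conditional probabilities. The Python sketch below is illustrative; the layout `cond[(a, b, c)][(x, y, z)]`, the function names and both example tables are assumptions made for this example.

```python
from itertools import product

def marginal(cond, setting, keep):
    """Marginal of P(x, y, z | setting) on the output positions listed in keep."""
    out = {}
    for outcome, p in cond[setting].items():
        key = tuple(outcome[i] for i in keep)
        out[key] = out.get(key, 0.0) + p
    return out

def is_non_signalling(cond, tol=1e-12):
    """Check (3)-(5) for cond[(a, b, c)][(x, y, z)] = P(x, y, z | a, b, c)."""
    # Each entry: (kept output positions, input position that must not matter).
    checks = [((1, 2), 0), ((0, 2), 1), ((0, 1), 2)]
    for keep, ignored in checks:
        for s1, s2 in product(cond, repeat=2):
            if any(s1[i] != s2[i] for i in range(3) if i != ignored):
                continue  # only compare settings that differ in the ignored input
            m1, m2 = marginal(cond, s1, keep), marginal(cond, s2, keep)
            if any(abs(m1.get(k, 0.0) - m2.get(k, 0.0)) > tol for k in set(m1) | set(m2)):
                return False
    return True

bits = (0, 1)
settings = list(product(bits, repeat=3))
# Hypothetical table 1: all outputs are uniform and ignore the inputs.
uniform = {s: {o: 0.125 for o in product(bits, repeat=3)} for s in settings}
# Hypothetical table 2: Y copies the distant input A, which violates (3).
signalling = {(a, b, c): {(x, a, z): 0.25 for x, z in product(bits, repeat=2)}
              for a, b, c in settings}
```

The first table satisfies all three conditions; the second fails because the marginal of (Y, Z) changes with A.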

Proof: Part II. For the second part of the proof, we 
consider the distribution $P_{XY|AB}$ resulting from certain 
appropriately chosen measurements on a maximally entangled 
state. We show that any enlargement of this 
distribution (via a system S that is accessed in a process 
with input SV C and output SV Z) to a distribution 
$P_{XYZ|ABC}$ which satisfies the above non-signalling 
conditions is necessarily trivial in the sense that S is uncorrelated 
with the rest. For this we draw on ideas from 
non-signalling cryptography [20], which are related to the 
idea of basing security on the violation of Bell inequalities 
[21]. Technically, we employ a lemma (see Lemma 1 
in the Supplementary Information), whose proof is based 
on chained Bell inequalities [22, 23] and generalizes results 
of [15, 24]. 

Consider any bipartite measurement with inputs 
$A \in \{0, 2, \ldots, 2N-2\}$ and $B \in \{1, 3, \ldots, 2N-1\}$, for some 
$N \in \mathbb{N}$, and binary outcomes, X and Y. The correlations 
of the outcomes can be quantified by 

$$I_N := P(X = Y | A = 0, B = 2N-1) + \sum_{a,b:\,|a-b|=1} P(X \neq Y | A = a, B = b). \qquad (7)$$



Our lemma then asserts that, under the non-signalling 
conditions derived in Part I, 

$$D(P_{Z|abcx}, P_{Z|abc}) \leq I_N \qquad (8)$$

for all a, b, c and x, where D is the variational distance, 
defined by $D(P_Z, Q_Z) := \frac{1}{2} \sum_z |P_Z(z) - Q_Z(z)|$. The 
variational distance has the following operational interpretation: 
if two distributions have variational distance 
at most $\delta$, then the probability that we ever notice a 
difference between them is at most $\delta$. 
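
As a small illustration of the variational distance defined above, the following Python sketch evaluates D for two distributions given as dictionaries. The function name and the example distributions are hypothetical.

```python
def variational_distance(p, q):
    """D(P, Q) = 1/2 * sum_z |P(z) - Q(z)| for distributions given as dicts."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(z, 0.0) - q.get(z, 0.0)) for z in support)

# Hypothetical example: a fair coin compared with a biased one.
fair = {0: 0.5, 1: 0.5}
biased = {0: 0.75, 1: 0.25}
distance = variational_distance(fair, biased)  # 0.5 * (0.25 + 0.25) = 0.25
```

Identical distributions give 0 and distributions with disjoint support give 1, the two extremes of the operational interpretation.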

The argument up to here is formally independent of 
quantum theory. However, as we describe below (see the 
Experimental Verification section), for any fixed orthogonal 
rank-one measurement on a two-level subsystem, one 
can construct $2N - 1$ other measurements such that, according 
to quantum theory, applying these measurements 
to maximally entangled states leads to correlations which 
satisfy $I_N \propto 1/N$. It follows that, in the limit of large N, an 
arbitrarily small bound on $D(P_{Z|abcx}, P_{Z|abc})$ can be obtained. 
We thus conclude that $P_{Z|abcx} = P_{Z|abc}$, which, 
by the non-signalling condition (4), also implies (2). We 
have therefore shown that the relation (1) holds for the 
outcome X of any orthogonal rank-one measurement on 
a system that is maximally entangled with another one 
(our claim can be readily extended to systems of dimension 
$2^t$ for positive integer t by applying the result to t 
two-level systems). 

We also remark that Markov chains are reversible, i.e. 
$P_{Z|abcx} = P_{Z|abc}$ implies $P_{X|abcz} = P_{X|abc}$, which together 
with the non-signalling conditions gives $P_{X|abcz} = P_{X|a}$. 
This establishes that, for any choices of B and C, 
learning Z does not allow an improvement on the quantum 
predictions, $P_{X|a}$. 

Proof: Part III. To complete our claim, it remains 
to show that the Markov chain condition (1) holds for 
measurements on arbitrary states (not only for those 
on one part of a maximally entangled state shared between 
two sites). The proof of this proceeds in two steps. 
The first is to append an additional measurement with 
outcome X', chosen such that the pair (X, X') is uniformly 
distributed. In the second step, we split the measurement 
into two conceptually distinct parts, where, in 
the first, the measurement apparatus becomes entangled 
with the system to be measured (and, possibly, the environment) 
and, in the second, this entangled state is measured 
giving outcomes (X, X'). Since these outcomes are 
uniformly distributed, the state before the measurement 
can be considered maximally entangled, so that (1) holds 
with X replaced by (X, X'). This implies (1) for X itself and hence 
completes the proof of Theorem 1. 

FIG. 2: Achievable values of $I_N$ depending on the experimental 
visibility. This figure relates to the measurement setup 
used for testing the accuracy of Assumption QM as described in 
the Methods. The setup involves two parties and is parameterized 
by the number of possible measurement choices available to 
each party, N. The plot gives the minimum $I_N$ achievable depending 
on the visibility (red line), which determines the smallest 
upper bound on the variational distance from the perfect Markov 
chain condition (1) that could be obtained with that visibility (see 
Eq. (8)). It also shows the optimal value of N which achieves this 
(blue line). For comparison, the values achievable using N = 2, 
which corresponds to the CHSH measurements [26] (yellow line), 
and the case N = 8, which is optimal for visibility 0.98 (green 
line), are shown. 



Experimental Verification. As explained above, the 
validity of parts of Assumption QM can be established by 
a direct experiment. In particular, to verify the existence 
of the correlations required for Part II of the proof, i.e. 
those with small $I_N$, one should generate a large number 
(much larger than N) of maximally entangled particles 
and distribute them between the measurement devices. 
At spacelike separation, a two-level subsystem (e.g. a spin 
degree of freedom) should then be measured, the measurement 
being picked at random from those specified 
below, and the results recorded. This is repeated for all 
of the particles. The measurement choices and results 
are then collected and used to estimate the terms in $I_N$ 
using standard statistical techniques. 

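For an arbitrary orthogonal basis $\{|0\rangle, |1\rangle\}$, the required 
measurements can be constructed in the following 
way. Recall that the choice of measurement on 
one side takes values $A \in \{0, 2, \ldots, 2N-2\}$ and similarly, 
$B \in \{1, 3, \ldots, 2N-1\}$. We define a set of angles 
$\theta_j = \frac{\pi}{2N} j$ and states 

$$\{|\theta_+^j\rangle, |\theta_-^j\rangle\} = \left\{ \cos\frac{\theta_j}{2}|0\rangle + \sin\frac{\theta_j}{2}|1\rangle,\; \sin\frac{\theta_j}{2}|0\rangle - \cos\frac{\theta_j}{2}|1\rangle \right\}.$$

The required measurement operators are then $E_\pm^a = |\theta_\pm^a\rangle\langle\theta_\pm^a|$ and $F_\pm^b = |\theta_\pm^b\rangle\langle\theta_\pm^b|$. 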
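
As a sanity check of this construction, the following Python sketch evaluates $I_N$ of Eq. (7) for projective measurements at angles $\theta_j = \pi j/(2N)$. It assumes that the shared state is $(|00\rangle + |11\rangle)/\sqrt{2}$ (the text only says "maximally entangled"), that outcome labels 0 and 1 stand for + and -, and that real amplitudes suffice; the function name is hypothetical.

```python
import math

def chained_bell_value(n):
    """I_N of Eq. (7), assuming the state (|00> + |11>)/sqrt(2)."""
    def basis(j):
        # (|theta_+^j>, |theta_-^j>) as pairs of real amplitudes on |0>, |1>.
        half = math.pi * j / (2 * n) / 2
        c, s = math.cos(half), math.sin(half)
        return (c, s), (s, -c)

    def prob(a, b, x, y):
        # Squared overlap of the product state with (|00> + |11>)/sqrt(2).
        u, v = basis(a)[x], basis(b)[y]
        return (u[0] * v[0] + u[1] * v[1]) ** 2 / 2

    def p_equal(a, b):
        return prob(a, b, 0, 0) + prob(a, b, 1, 1)

    total = p_equal(0, 2 * n - 1)
    for a in range(0, 2 * n, 2):        # A in {0, 2, ..., 2N-2}
        for b in range(1, 2 * n, 2):    # B in {1, 3, ..., 2N-1}
            if abs(a - b) == 1:
                total += 1.0 - p_equal(a, b)
    return total
```

Under these assumptions each of the 2N terms equals $\sin^2(\pi/(4N))$, so the function returns $2N \sin^2(\pi/(4N))$, which behaves like $\pi^2/(8N)$ for large N.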



Although quantum theory predicts that arbitrarily 
small values of $I_N$ can be obtained for large N, due to 
imperfections and errors in the devices, it will not be possible 
to experimentally achieve this. In [25], a discussion 
of the achievable values of $I_N$ with imperfect visibilities 
was given. For visibilities less than 1, it is not optimal to 
take N as large as possible to minimize the observed $I_N$. 
Thus, to get increasingly small bounds on the variational 
distance in (8), one must increase the experimentally obtained 
visibilities as well as the number of measurement 
settings (see Figure 2). 
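
The trade-off between visibility and the number of settings can be explored with a simple noise model. The Python sketch below assumes (this is not stated in the text) that an imperfect visibility V scales every correlator by V, so that $P(X = Y | a, b) = (1 + V \cos(\theta_a - \theta_b))/2$ with $\theta_j = \pi j/(2N)$, and each of the 2N terms of $I_N$ equals $(1 - V \cos(\pi/(2N)))/2$. The function names and the search cap `n_max` are hypothetical.

```python
import math

def chained_bell_value_with_visibility(n, visibility):
    """I_N under the assumed white-noise model: 2N terms of (1 - V cos(pi/2N))/2."""
    return n * (1.0 - visibility * math.cos(math.pi / (2 * n)))

def best_number_of_settings(visibility, n_max=200):
    """N in 2..n_max (an arbitrary cap) that minimises I_N for this visibility."""
    return min(range(2, n_max + 1),
               key=lambda n: chained_bell_value_with_visibility(n, visibility))
```

With this model, `best_number_of_settings(0.98)` returns 8, in line with the value quoted in the caption of Figure 2, and for V = 1 the expression reduces to $2N \sin^2(\pi/(4N))$.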
SUPPLEMENTARY METHODS 

FORMAL STATEMENT OF THE CLAIM 

In this section, we provide a formal description of our result and the assumptions it is based on. Their physical 
significance is explained in the main text. 

Definitions 

Definition 1. A spacetime random variable (SV), X, is a random variable together with a set of coordinates 
$(t, r_1, r_2, r_3) \in \mathbb{R}^4$. 

The coordinates can be used to define an order relation between SVs, which one may interpret as a time ordering 
within relativistic spacetime. (Note, however, that on a formal level, we do not require any assumptions about 
relativity theory.) 

Definition 2. We say that a pair (A, X) of SVs is time-ordered, denoted $A \rightsquigarrow X$, if the coordinate $(t, r_1, r_2, r_3)$ of A 
lies in the backward lightcone of the coordinate $(t', r'_1, r'_2, r'_3)$ of X, i.e., $(t - t')^2 > \|r - r'\|^2$, $t < t'$. Furthermore, we 
say that two time-ordered pairs $A \rightsquigarrow X$ and $B \rightsquigarrow Y$ are spacelike separated if $A \not\rightsquigarrow Y$ and $B \not\rightsquigarrow X$. 
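
Definition 2 translates directly into code. The following Python sketch is a minimal illustration in units where the speed of light is 1; the class `SV`, the function names and the example coordinates are assumptions made here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SV:
    """A spacetime random variable: a label plus coordinates (t, r1, r2, r3)."""
    name: str
    t: float
    r: tuple  # spatial coordinates (r1, r2, r3)

def time_ordered(a, x):
    """True if a lies in the backward lightcone of x (Definition 2, c = 1)."""
    dist_sq = sum((p - q) ** 2 for p, q in zip(a.r, x.r))
    return a.t < x.t and (a.t - x.t) ** 2 > dist_sq

def spacelike_separated(a, x, b, y):
    """For time-ordered pairs (a, x) and (b, y): a does not precede y, nor b x."""
    return not time_ordered(a, y) and not time_ordered(b, x)

# Hypothetical layout: two laboratories ten length units apart.
A = SV("A", 0.0, (0.0, 0.0, 0.0))
X = SV("X", 1.0, (0.0, 0.0, 0.0))
B = SV("B", 0.0, (10.0, 0.0, 0.0))
Y = SV("Y", 1.0, (10.0, 0.0, 0.0))
```

Here each input precedes its own output, while neither input lies in the backward lightcone of the distant output, so the two pairs are spacelike separated.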

The next two definitions refer to quantum theory or, more precisely, quantum measurements. They will be used 
later for the formulation of Assumption QM. 

Definition 3. A quantum measurement, denoted $(A \rightsquigarrow X, \{E_x^a\}_{a,x}, \mathcal{H}_S)$, is a pair of time-ordered SVs, $A \rightsquigarrow X$, 
called input and output, respectively, together with a family of measurement operators $\{E_x^a\}_{a,x}$ on a Hilbert space, 
$\mathcal{H}_S$, such that $\sum_x (E_x^a)^\dagger E_x^a = \mathbb{1}_S$ for all a. 

We interpret the input A as the choice of an observable and X as the outcome of the measurement with respect 
to this observable. Quantum theory determines the distribution of X conditioned on A, depending on the quantum 
state $\rho_S$ of the system to which the measurement is applied. 

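Definition 4. Given a density operator $\rho_S$ on $\mathcal{H}_S$, the quantum measurement $(A \rightsquigarrow X, \{E_x^a\}_{a,x}, \mathcal{H}_S)$ is said to be 
compatible with $\rho_S$ if 

$$P_{X|A}(x|a) = \mathrm{tr}\big((E_x^a)^\dagger E_x^a \rho_S\big),$$

for all a and x. Likewise, a pair of quantum measurements $(A \rightsquigarrow X, \{E_x^a\}_{a,x}, \mathcal{H}_S)$ and $(B \rightsquigarrow Y, \{F_y^b\}_{b,y}, \mathcal{H}_T)$ is said 
to be compatible with $\rho_{ST}$ defined on $\mathcal{H}_S \otimes \mathcal{H}_T$ if 

$$P_{XY|AB}(xy|ab) = \mathrm{tr}\big[\big((E_x^a)^\dagger E_x^a \otimes (F_y^b)^\dagger F_y^b\big) \rho_{ST}\big],$$

for all a, b, x and y. 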
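
The single-measurement condition of Definition 4 can be evaluated numerically. The Python sketch below uses plain nested lists for matrices; the helper names, the qubit state and the projective measurement at angle theta are hypothetical choices made for illustration.

```python
import math

def dagger(m):
    """Conjugate transpose of a matrix given as a list of rows."""
    return [[m[j][i].conjugate() for j in range(len(m))] for i in range(len(m[0]))]

def matmul(m, n):
    """Matrix product of two matrices given as lists of rows."""
    return [[sum(m[i][k] * n[k][j] for k in range(len(n))) for j in range(len(n[0]))]
            for i in range(len(m))]

def outcome_probability(e, rho):
    """tr(E^dagger E rho) for a measurement operator E and density matrix rho."""
    m = matmul(matmul(dagger(e), e), rho)
    return sum(m[i][i] for i in range(len(m))).real

def projector(theta):
    """|v><v| for v = cos(theta/2)|0> + sin(theta/2)|1>."""
    v = [complex(math.cos(theta / 2)), complex(math.sin(theta / 2))]
    return [[v[i] * v[j].conjugate() for j in range(2)] for i in range(2)]

# Hypothetical example: the state |0><0| measured at angle theta = pi/3.
rho = [[1, 0], [0, 0]]
theta = math.pi / 3
p_plus = outcome_probability(projector(theta), rho)             # cos^2(theta/2)
p_minus = outcome_probability(projector(theta + math.pi), rho)  # sin^2(theta/2)
```

A measurement is compatible with `rho` in the sense of Definition 4 when its observed distribution matches these values for every setting and outcome.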

We describe the process of choosing a value A as a pair of SVs, $O_A \rightsquigarrow A$, where $O_A$ is called the trigger event ($O_A$ 
may be a constant). The process is considered free if the outcome A is not correlated to anything that existed before 
the trigger event $O_A$ in any reference frame. 

Definition 5. Given a set of SVs $\Gamma$, a free choice (with respect to $\Gamma$) is a pair of time-ordered SVs, $O_A \rightsquigarrow A$, such 
that A is statistically independent of the collection $\Gamma' := \{W \in \Gamma : O_A \not\rightsquigarrow W\}$, i.e., $P_{A\Gamma'} = P_A \times P_{\Gamma'}$. 

Quantum-Mechanical Description of the Measurement Process 

Before stating our assumptions, let us briefly recall the quantum-mechanical description of a measurement process. 
Most generally, a quantum measurement on a system S is described by a family $\{E_x\}_x$ of operators acting on a Hilbert 
space $\mathcal{H}_S$ such that $\sum_x E_x^\dagger E_x = \mathbb{1}$. If the state of S before the measurement is given by a density operator $\rho_S$ then 
each possible outcome X = x has probability 

$$P_X(x) = \mathrm{tr}(E_x^\dagger E_x \rho_S).$$

(Note that this is reflected by Definitions 3 and 4.) Furthermore, conditioned on this outcome, the state of S after 
the measurement is 

$$\sigma_S^{(x)} = \frac{E_x \rho_S E_x^\dagger}{P_X(x)}.$$

Averaged over all outcomes, the state is therefore given by $\sigma_S = \mathcal{E}(\rho_S)$, where $\mathcal{E}$ is the trace-preserving completely 
positive map (TPCPM) defined by 

$$\mathcal{E} : \rho_S \mapsto \sigma_S = \sum_x P_X(x)\, \sigma_S^{(x)} = \sum_x E_x \rho_S E_x^\dagger.$$

The TPCPM $\mathcal{E}$ can be seen as part of an extended TPCPM $\bar{\mathcal{E}} : \rho_S \mapsto \sigma_{SDR}$ (in the sense that $\mathcal{E} = \mathrm{tr}_{DR} \circ \bar{\mathcal{E}}$) which 
specifies the joint state $\sigma_{SDR}$ of S, the measurement device, D, and possibly (parts of) the environment, R, after the 
measurement (one may think of $\bar{\mathcal{E}}$ as describing the joint evolution that the system S, measurement device D and the 
environment R undergo during a measurement). By choosing a sufficiently large environment, we can always take $\bar{\mathcal{E}}$ 
to be an isometry. Since the measurement outcome X is determined by the final state of the measurement device D, 
there exists a family of mutually orthogonal projectors $\{\Pi_x\}_x$ on the associated Hilbert space $\mathcal{H}_D$, where each $\Pi_x$ 
projects onto the subspace containing the support of the state of D corresponding to outcome X = x. Formally, this 
corresponds to the requirement that 

$$\forall x : \mathrm{tr}_{DR}\big[\bar{\mathcal{E}}(\rho_S)(\mathbb{1}_S \otimes \Pi_x \otimes \mathbb{1}_R)\big] = E_x \rho_S E_x^\dagger. \qquad (S.9)$$

Assumptions 

To formulate our assumptions as well as our main claim, we consider an arbitrary quantum measurement 

$$(A \rightsquigarrow X, \{E_x^a\}_{a,x}, \mathcal{H}_S) \qquad (S.10)$$

with constant input A = a and output X. Furthermore, we consider two SVs, C and Z, such that $C \rightsquigarrow Z$, which 
model the access to extra information provided by a potential extended theory. 

Our first assumption demands that the measurement we consider is correctly described by quantum mechanics. 



Assumption QMa. There exists a pure quantum state $\rho_S$ which is compatible with the quantum measurement (S.10). 



For the next assumption, let $\bar{\mathcal{E}} : \rho_S \mapsto \sigma_{SDR}$ be an isometry from $\mathcal{H}_S$ to $\mathcal{H}_S \otimes \mathcal{H}_D \otimes \mathcal{H}_R$ and let $\{\Pi_x\}_x$ be a family of
projectors such that (S.9) holds for the operators $\{E_x\}_x$ specified by the measurement (S.10).¹ The isometry $\bar{\mathcal{E}}$ models
the joint evolution of the system, S, on which the measurement (S.10) is carried out, the measurement device, D,
and the parts of the environment, R, that may have been affected by the measurement.² We then consider arbitrary
measurements $\{F_x^a\}_{a,x}$ and $\{G_y^b\}_{b,y}$ on the subsystems D and SR, respectively, with the property that $F_x^a = \Pi_x$.
The following assumption demands that the statistics produced by these additional measurements are as predicted
by quantum theory. Furthermore, the outcome X of the initial measurement (S.10) can be recovered by measuring
(in an appropriate basis) the state of the device D used for this measurement.

Assumption QMb. For appropriately defined SVs A', X', B, Y, the quantum measurements $(A' \rightsquigarrow X', \{F_x^a\}_{a,x}, \mathcal{H}_D)$
and $(B \rightsquigarrow Y, \{G_y^b\}_{b,y}, \mathcal{H}_S \otimes \mathcal{H}_R)$ are compatible with $\sigma_{SDR} = \bar{\mathcal{E}}(\rho_S)$. Furthermore, the measurement on D is consistent
with the initial measurement (S.10), in the sense that X' = X whenever A' = A = a.



While the above assumptions are essentially consequences of the requirement that the existing quantum theory is 
correct, our last assumption demands that the measurement settings can be chosen freely. 



¹ There are many ways to choose $\bar{\mathcal{E}}$ and $\{\Pi_x\}_x$ with this property;
our next assumption need only hold for one such choice.
² Note that, for (S.9) to hold, it is sufficient that $\bar{\mathcal{E}}$ describes the
interaction between S and D (and possibly R) on a microscopic
scale and for a short time. Hence, the fact that $\bar{\mathcal{E}}$ is an isometry
does not preclude subsequent "collapse" of the wave function.



Supplementary Figure S1 | Abstraction of the setup. Q1 and Q2 depict a pair of quantum systems with inputs A and
B and outputs X and Y respectively. S is a system which represents the additional information provided by the extended
theory. Although these three systems (solid boxes) can be independently manipulated, they form parts of a larger system
(dotted box). While no restriction is placed on the internal behaviour of the larger system, it follows from Part I of the
proof that the combined distribution, $P_{XYZ|ABC}$, is non-signalling.

Assumption FR. There exist SVs $O_A$, $O_B$ and $O_C$ with $O_A \rightsquigarrow X'$, $O_B \rightsquigarrow Y$ and $O_C \rightsquigarrow Z$ spacelike separated
such that $O_A \rightsquigarrow A'$, $O_B \rightsquigarrow B$ and $O_C \rightsquigarrow C$ are free choices with respect to $\{A', B, C, X', Y, Z\}$, and all possible values
of A' and B are taken with nonzero probability.

Main Claim 



Theorem 1. If the quantum measurement (S.10), modelled by the pair $A \rightsquigarrow X$, and the additional information,
$C \rightsquigarrow Z$, are such that Assumptions QMa, QMb and FR are satisfied then the Markov chain condition $X \leftrightarrow (A, C) \leftrightarrow Z$
holds.

PART II OF THE PROOF 

In this section, we prove the core inequality of Part II of our proof, Eqn. 8 in the Methods, which is stated as
Lemma 1 below.

Recall the bipartite scenario described in the main text. The measurements at each site are parameterized by values 
$A \in \{0, 2, \ldots, 2N-2\}$ and $B \in \{1, 3, \ldots, 2N-1\}$ for some $N \in \mathbb{N}$, and their respective outcomes, X and Y, are
taken to be binary. The measurements give rise to a joint probability distribution $P_{XY|AB}$ from which we quantify
the correlations relevant for our statement in terms of $I_N$ defined by

\[ I_N(P_{XY|AB}) := P(X = Y | A = 0, B = 2N-1) + \sum_{\substack{a,b \\ |a-b|=1}} P(X \neq Y | A = a, B = b) . \]
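To illustrate how $I_N$ behaves, the following sketch evaluates it for one family of correlations. The form $P(X \neq Y | a, b) = \sin^2(\pi (a-b)/(4N))$ is an assumption made here for illustration (it is not stated in this section), and the function names are hypothetical.

```python
import math

def chained_pairs(N):
    """All setting pairs (a, b) with a even, b odd and |a - b| = 1."""
    return [(a, b) for a in range(0, 2 * N, 2)
                   for b in range(1, 2 * N, 2) if abs(a - b) == 1]

def I_N(N, p_differ):
    """I_N = P(X = Y | 0, 2N-1) + sum over |a-b| = 1 of P(X != Y | a, b)."""
    first = 1.0 - p_differ(0, 2 * N - 1)
    return first + sum(p_differ(a, b) for a, b in chained_pairs(N))

def quantum_p_differ(N):
    # Assumed correlations (not given in this section):
    # P(X != Y | a, b) = sin^2(pi * (a - b) / (4N)).
    return lambda a, b: math.sin(math.pi * (a - b) / (4 * N)) ** 2

for N in (2, 10, 100):
    print(N, I_N(N, quantum_p_differ(N)))
```

Under this assumption each of the 2N terms equals $\sin^2(\pi/(4N))$, so the total is at most $\pi^2/(8N)$ and shrinks as N grows.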

We consider enlargements of this probability distribution, $P_{XYZ|ABC}$ (see Figure S1), that satisfy the non-signalling
property (cf. Part I of the proof), i.e.,

\[ P_{XY|ABC} = P_{XY|AB} \tag{S.11} \]

\[ P_{XZ|ABC} = P_{XZ|AC} \tag{S.12} \]

\[ P_{YZ|ABC} = P_{YZ|BC} . \tag{S.13} \]

The claim is that any such extension approximately satisfies $P_{Z|abcx} = P_{Z|abc}$, i.e., Z is independent of X for any
choices of a, b and c. The accuracy of the approximation is measured in terms of the variational distance. For two
distributions, $P_X$ and $P_Y$ over identical alphabets, this is defined by $D(P_X, P_Y) := \frac{1}{2} \sum_i |P_X(i) - P_Y(i)|$.
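As a concrete reading of this definition, here is a minimal sketch of the variational distance for finite distributions. Representing a distribution as a dictionary from outcomes to probabilities is a choice made for this example only.

```python
def variational_distance(p, q):
    """D(P, Q) = 1/2 * sum_i |P(i) - Q(i)| for distributions given as dicts."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(i, 0.0) - q.get(i, 0.0)) for i in support)

uniform = {0: 0.5, 1: 0.5}
biased = {0: 0.8, 1: 0.2}
print(variational_distance(biased, uniform))
```

For a binary variable the distance to the uniform distribution reduces to $|P(x) - \frac{1}{2}|$, which is the form used later in the proof of Lemma 1.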

Lemma 1. For any non-signalling probability distribution, $P_{XYZ|ABC}$, in which the random variables X and Y are
binary, we have

\[ D(P_{Z|abcx}, P_{Z|abc}) \leq I_N(P_{XY|AB}) \tag{S.14} \]

for all a, b, c, and x.

The proof is a generalization of an argument given in [15], which develops results of [20] and [24]. 
Proof. We first consider the quantity $I_N$ evaluated for the conditional distribution $P_{XY|AB,cz} = P_{XY|ABCZ}(\cdot, \cdot | \cdot, \cdot, c, z)$,
for any fixed c and z. The idea is to use this quantity to bound the trace distance between the conditional distribution
$P_{X|acz}$ and its negation, $1 - P_{X|acz}$, which corresponds to the distribution of X if its values are interchanged. If this
distance is small, it follows that the distribution $P_{X|acz}$ is roughly uniform.
Let $\bar{P}_X$ be the uniform distribution on X. For $a_0 := 0$, $b_0 := 2N - 1$, we have

\begin{align}
I_N(P_{XY|AB,cz}) &= P(X = Y | A = a_0, B = b_0, C = c, Z = z) + \sum_{\substack{a,b \\ |a-b|=1}} P(X \neq Y | A = a, B = b, C = c, Z = z) \nonumber \\
&\geq D(1 - P_{X|a_0 b_0 cz}, P_{Y|a_0 b_0 cz}) + \sum_{\substack{a,b \\ |a-b|=1}} D(P_{X|abcz}, P_{Y|abcz}) \nonumber \\
&= D(1 - P_{X|a_0 cz}, P_{Y|b_0 cz}) + \sum_{\substack{a,b \\ |a-b|=1}} D(P_{X|acz}, P_{Y|bcz}) \nonumber \\
&\geq D(1 - P_{X|a_0 cz}, P_{X|a_0 cz}) \nonumber \\
&= 2 D(P_{X|a_0 b_0 cz}, \bar{P}_X) . \tag{S.15}
\end{align}



The first inequality follows from the fact that $D(P_{X|\Omega}, P_{Y|\Omega}) \leq P(X \neq Y | \Omega)$ for any event $\Omega$ (see Lemma 2 below).
Furthermore, we have used the non-signalling conditions $P_{X|abcz} = P_{X|acz}$ (from (S.12)) and $P_{Y|abcz} = P_{Y|bcz}$
(from (S.13)), and the triangle inequality for D. By symmetry, this relation holds for all a and b. We hence obtain
$D(P_{X|abcz}, \bar{P}_X) \leq \frac{1}{2} I_N(P_{XY|AB,cz})$ for all a, b, c and z.

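We now take the average over z on both sides of (S.15). The left-hand side gives

\begin{align}
\sum_z P_{Z|abc}(z) I_N(P_{XY|AB,cz}) &= \sum_z P_{Z|c}(z) I_N(P_{XY|AB,cz}) \nonumber \\
&= \sum_z P_{Z|a_0 b_0 c}(z) P(X = Y | a_0, b_0, c, z) + \sum_{\substack{a,b \\ |a-b|=1}} \sum_z P_{Z|abc}(z) P(X \neq Y | a, b, c, z) \nonumber \\
&= P(X = Y | a_0, b_0, c) + \sum_{\substack{a,b \\ |a-b|=1}} P(X \neq Y | a, b, c) \nonumber \\
&= I_N(P_{XY|AB,c}) , \tag{S.16}
\end{align}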

where we used the non-signalling condition $P_{Z|abc} = P_{Z|c}$ (which is implied by (S.12) and (S.13)) several times.
Furthermore, taking the average on the right-hand side of (S.15) yields $\sum_z P_{Z|abc}(z) D(P_{X|abcz}, \bar{P}_X) = D(P_{XZ|abc}, \bar{P}_X \times P_{Z|abc})$,
so we have

\[ 2 D(P_{XZ|abc}, \bar{P}_X \times P_{Z|abc}) \leq I_N(P_{XY|AB,c}) = I_N(P_{XY|AB}) \tag{S.17} \]



where the last equality follows from the non-signalling condition (S.11).

Inequality (S.17) and the relation $D(P_X, Q_X) \leq D(P_{XY}, Q_{XY})$ imply $D(P_{X|abc}, \bar{P}_X) \leq \frac{1}{2} I_N(P_{XY|AB})$, and hence

\[ |P_{X|abc}(x) - \tfrac{1}{2}| \leq \tfrac{1}{2} I_N(P_{XY|AB}) \tag{S.18} \]



for all a, b, c and x. Furthermore, since

\[ 2 D(P_{XZ|abc}, \bar{P}_X \times P_{Z|abc}) = \sum_z |P_{XZ|abc}(0, z) - \tfrac{1}{2} P_{Z|abc}(z)| + \sum_z |P_{XZ|abc}(1, z) - \tfrac{1}{2} P_{Z|abc}(z)| , \]

and both terms on the right-hand side are equal, using (S.17) we have

\[ \sum_z |P_{XZ|abc}(x, z) - \tfrac{1}{2} P_{Z|abc}(z)| \leq \tfrac{1}{2} I_N(P_{XY|AB}) \]
for all a, b, c and x. Combining this with (S.18) gives

\begin{align}
D(P_{Z|abcx}, P_{Z|abc}) &= \sum_z |\tfrac{1}{2} P_{Z|abcx}(z) - \tfrac{1}{2} P_{Z|abc}(z)| \nonumber \\
&\leq \sum_z |\tfrac{1}{2} P_{Z|abcx}(z) - P_{X|abc}(x) P_{Z|abcx}(z)| + \sum_z |P_{X|abc}(x) P_{Z|abcx}(z) - \tfrac{1}{2} P_{Z|abc}(z)| \nonumber \\
&= \sum_z P_{Z|abcx}(z) |\tfrac{1}{2} - P_{X|abc}(x)| + \sum_z |P_{XZ|abc}(x, z) - \tfrac{1}{2} P_{Z|abc}(z)| \nonumber \\
&\leq I_N(P_{XY|AB}) . \nonumber
\end{align}



This establishes the relation (S.14). □



Lemma 2. Let X and Y be random variables jointly distributed according to $P_{XY}$. The variational distance between
the marginal distributions $P_X$ and $P_Y$ is bounded by

\[ D(P_X, P_Y) \leq P(X \neq Y) . \]

Proof. Let $P^{\neq}_{XY} := P_{XY|X \neq Y}$ be the joint distribution of X and Y conditioned on the event that they are not equal.
Similarly, define $P^{=}_{XY} := P_{XY|X = Y}$. We then have

\[ P_{XY} = p_{\neq} P^{\neq}_{XY} + (1 - p_{\neq}) P^{=}_{XY} \]

where $p_{\neq} := P(X \neq Y)$. By linearity, the marginals of these distributions satisfy the same relation, i.e.,

\[ P_X = p_{\neq} P^{\neq}_X + (1 - p_{\neq}) P^{=}_X \quad \text{and} \quad P_Y = p_{\neq} P^{\neq}_Y + (1 - p_{\neq}) P^{=}_Y . \]

Hence, by the convexity of the variational distance,

\[ D(P_X, P_Y) \leq p_{\neq} D(P^{\neq}_X, P^{\neq}_Y) + (1 - p_{\neq}) D(P^{=}_X, P^{=}_Y) \leq p_{\neq} , \]

where the last inequality follows because the variational distance cannot be larger than one, and $D(P^{=}_X, P^{=}_Y) = 0$. □
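The bound of Lemma 2 can be sanity-checked on randomly generated joint distributions. The sketch below is illustrative only; the helper names, the alphabet size and the sampling scheme are assumptions of this example and not part of the proof.

```python
import random

def marginals(joint):
    """Marginal distributions of X and Y from a joint given as {(x, y): prob}."""
    px, py = {}, {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py

def variational_distance(p, q):
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def prob_differ(joint):
    """P(X != Y) under the joint distribution."""
    return sum(p for (x, y), p in joint.items() if x != y)

def random_joint(alphabet_size, rng):
    weights = {(x, y): rng.random()
               for x in range(alphabet_size) for y in range(alphabet_size)}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}

rng = random.Random(0)
gaps = []
for _ in range(1000):
    joint = random_joint(3, rng)
    px, py = marginals(joint)
    # Lemma 2 states that this gap is never negative.
    gaps.append(prob_differ(joint) - variational_distance(px, py))
print(min(gaps))
```

A perfectly correlated pair, for which X = Y always, gives zero on both sides of the inequality.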

PART III OF THE PROOF 

In this section we give the proof of the final part of Theorem 1.³ We use the setup and assumptions as formulated
at the beginning of the Supplementary Methods. In Parts I and II of the proof (see the main text and the previous
section) we showed that for all a, b, c and x, the relation $P_{Z|acx} = P_{Z|ac}$ holds for projective quantum measurements
compatible with one half of a maximally entangled state (cf. Lemma 1 and recall that for such measurements, the
quantity $I_N$ can be made arbitrarily small for sufficiently large N). Part III, explained here, extends this claim to
arbitrary states (not necessarily maximally entangled ones) and arbitrary measurements.

The argument proceeds in two steps. The first is to reduce the problem to a situation where the measurement 
outcome is essentially uniform. Let $(A \rightsquigarrow X, \{E_x^a\}_x, \mathcal{H}_S)$ be the quantum measurement under consideration (where
the input A = a is fixed). The idea is that we can always append a second measurement, generating $\bar{X}$, such that the
distribution of the joint output $(X, \bar{X})$ is flat (to any desired accuracy).

Lemma 3. Let $\varepsilon > 0$ and let $\rho_S$ be an arbitrary density operator on $\mathcal{H}_S$. For any measurement on S there exists an
additional measurement such that the joint output distribution of $(X, \bar{X})$, obtained by applying the two measurements
sequentially to $\rho_S$, has distance at most $\varepsilon$ to a flat distribution.

Proof idea. It is easy to see that any probability distribution can be turned into an approximately flat one by adding 
an additional random process that "splits" each probability into sufficiently many smaller events. Furthermore, any 
such random process can be obtained by an appropriate choice of projective measurement (in a sufficiently large 
Hilbert space). □ 
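The splitting idea can be made concrete with a small classical sketch. The rule used here, splitting outcome x into roughly p_x times `resolution` equally likely sub-outcomes, is one possible choice assumed for illustration, and the function names are hypothetical.

```python
import math

def split_counts(probs, resolution):
    """Hypothetical helper: how many sub-outcomes each outcome x is split into."""
    return [max(1, round(p * resolution)) for p in probs]

def distance_to_flat(probs, resolution):
    """Variational distance between the split distribution of (x, xbar) and the flat one."""
    counts = split_counts(probs, resolution)
    total = sum(counts)
    # Outcome x becomes counts[x] pairs (x, xbar), each with probability p_x / counts[x].
    return 0.5 * sum(n * abs(p / n - 1.0 / total) for p, n in zip(probs, counts))

p = [1 / math.sqrt(2), 1 - 1 / math.sqrt(2)]
for resolution in (10, 100, 1000, 10000):
    print(resolution, distance_to_flat(p, resolution))
```

With this rounding rule the distance is at most the number of outcomes divided by `resolution`, so it can be made smaller than any given $\varepsilon$ by choosing the resolution large enough.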



³ The proof we give here is similar to an argument given by
Zurek [27] to derive the Born rule starting from unitarity.
Let $\{E^a_{x,\bar{x}}\}_{x,\bar{x}}$ be the set of measurement operators corresponding to the measurement $(A \rightsquigarrow (X, \bar{X}), \{E^a_{x,\bar{x}}\}_{x,\bar{x}}, \mathcal{H}_S)$
which generates the pair $(X, \bar{X})$, and let $\rho_S$ be a pure quantum state compatible with this measurement (see
Assumption QMa). Next, we introduce projectors $\{\Pi_{x,\bar{x}}\}_{x,\bar{x}}$ and an isometry $\bar{\mathcal{E}}$ such that $\sigma_{SDR} = \bar{\mathcal{E}}(\rho_S)$ satisfies

\[ \mathrm{tr}_{DR}\big((1_S \otimes \Pi_{x,\bar{x}} \otimes 1_R)\, \sigma_{SDR}\, (1_S \otimes \Pi_{x,\bar{x}} \otimes 1_R)\big) = E^a_{x,\bar{x}}\, \rho_S\, (E^a_{x,\bar{x}})^\dagger . \]

(Note that the isometry can always be defined such that the projectors $\Pi_{x,\bar{x}}$ have rank one.) According to
Assumption QMb we can append additional quantum measurements $(A' \rightsquigarrow (X', \bar{X}'), \{F^a_{x,\bar{x}}\}_{a,x,\bar{x}}, \mathcal{H}_D)$ (with $F^a_{x,\bar{x}} = \Pi_{x,\bar{x}}$)
and $(B \rightsquigarrow Y, \{G^b_y\}_{b,y}, \mathcal{H}_S \otimes \mathcal{H}_R)$, such that the output statistics are compatible with $\sigma_{SDR}$. Furthermore,
$(X', \bar{X}') = (X, \bar{X})$ whenever A' = a. Finally, by Assumption FR we can take A' and B to be free choices with
$O_A \rightsquigarrow (X', \bar{X}')$, $O_B \rightsquigarrow Y$, and $O_C \rightsquigarrow Z$ spacelike separated (where $O_A$, $O_B$, and $O_C$ are the trigger events for A', B,
and C, respectively).

Since the outcomes $(X', \bar{X}')$ of the measurement (for A' = a) are almost (up to an arbitrarily small distance $\varepsilon$)
uniformly distributed, and the state $\sigma_{SDR}$ is pure, it must be (almost) maximally entangled between the measurement
device, $\mathcal{H}_D$, and the remaining systems, $\mathcal{H}_S \otimes \mathcal{H}_R$ (by a suitable choice of the additional measurement, we can always
take this to be maximally entangled over an integer number of two-level systems). Furthermore, $\{\Pi_{x,\bar{x}}\}_{x,\bar{x}}$ are
orthogonal projectors. Hence, by a suitable choice of the additional measurements producing $(X', \bar{X}')$ and Y, the
argument given in Parts I and II of the proof implies that, for any $\varepsilon > 0$ and for all c, x and $\bar{x}$,

\[ D(P_{Z|A'=a,cx\bar{x}}, P_{Z|A'=a,c}) \leq \varepsilon . \]

Since the values of $(X, \bar{X})$ and $(X', \bar{X}')$ coincide for A' = A = a (cf. Assumption QMb), we have

\[ D(P_{Z|A=a,cx}, P_{Z|A=a,c}) \leq \varepsilon . \]

This relation holds for all a, and, since $\varepsilon$ can be arbitrarily small, establishes the desired Markov chain condition

\[ P_{Z|A=a,cx} = P_{Z|A=a,c} . \]

REMARKS ON THE NOTION OF LOCALITY 

Here we make some comments about the notion of locality. The main point is to highlight that Bell's notion of 
locality is similar to, but slightly less general than, the non-signalling nature of the extension (as derived in Part I of 
the proof). 

To quote Bell [2], locality is the requirement that "...the result of a measurement on one system [is] unaffected by 
operations on a distant system with which it has interacted in the past..." Indeed, our non-signalling conditions reflect 
this requirement and, in our language, the statement that $P_{XYZ|ABC}$ is non-signalling is equivalent to a statement that
the model is local (see also the discussion in [28]). (We remind the reader that we do not assume the non-signalling
conditions, but instead derive them from the free choice assumption.) 

In spite of the above quote, Bell's formal definition of locality is slightly more restrictive than these non-signalling 
conditions. Bell considers extending the theory using hidden variables, here denoted by the variable Z. He requires 
$P_{XY|abz} = P_{X|az} \times P_{Y|bz}$ (see e.g. [13]), which corresponds to assuming not only $P_{X|abz} = P_{X|az}$ and $P_{Y|abz} = P_{Y|bz}$
(the non-signalling constraints, also called parameter-independence in this context), but also $P_{X|abyz} = P_{X|abz}$
and $P_{Y|abxz} = P_{Y|abz}$ (also called outcome-independence). These additional constraints do not follow
from our assumptions and are not used in this work.

A possible reason for the discrepancy is that Bell principally considered extended theories which are deterministic 
given the hidden variables. In this case, the distinction between Bell's notion of locality and the non-signalling 
conditions we use is unimportant: if X is deterministic given A and Z, then $P_{X|abyz} = P_{X|az}$ follows automatically.
In fact, the converse also holds: given parameter-independence and outcome-independence, a necessary condition for
the model to recreate the quantum correlations arising from measurements on a maximally entangled state is that it is
deterministic given the hidden variables. To see this, note that for any measurement A = a, there is a corresponding
measurement $B = b_a$ such that quantum theory predicts identical outcomes. In other words, $P_{X|a b_a y z} = \delta_{x,y}$. The
assumptions of parameter-independence and outcome-independence give $P_{X|az} = P_{X|a b_a y z}$, and so $P_{X|az}(x) = \delta_{x,y}$.
This implies that X and Y are determined given A and Z.

CANDIDATE EXTENSIONS BASED ON SIMULATIONS OF QUANTUM CORRELATIONS 

It has been shown in a number of ways that quantum correlations can be simulated from other resources. For 
example, all correlations generated by projective measurements on a maximally entangled pair of qubits can be 
simulated by shared randomness and one bit of classical communication [29], or by shared randomness and a
non-local box [30] (a hypothetical device with stronger-than-quantum correlations [31, 32]). Furthermore, these results
have been generalized to arbitrary (not necessarily maximally entangled) pure states [33].

Since such simulations recreate quantum correlations, they may appear at first sight to be extensions of quantum
theory. We will not provide an exhaustive treatment of all such models, but instead give a short explanation as to
why the examples above do not contradict our claim.

First note that the ability to simulate quantum correlations does not imply the ability to predict the outcomes of a 
genuine quantum experiment. However, when thinking about these simulations in the context of extending quantum 
theory, the hypothesis is that the components of the simulation really exist and are used to generate outcomes. 

The case where communication is needed is analogous to de Broglie-Bohm theory [16, 17] (discussed in the main 
text). In order that the simulation can work in the case of spacelike separated measurements, the communication 
bit, Z (which depends on one of the measurement choices, say A), must propagate faster than light. The bit Z is 
therefore accessible outside the future lightcone of A. According to Assumption FR, it must be possible to choose A 
to be independent of this (now pre-existing) information, which would no longer be the case. Such models therefore 
contradict Assumption FR. 

In the model of [30], where a non-local box is used for the simulation, even with full access to this box, there is no
better way to predict the measurement outcomes. To see this, note that the output, X, of a measurement specified
by a parameter, A, is generated in the simulation by XORing a shared classical value with the output of a non-local
box, whose input depends on A. Since the individual outputs of a non-local box are uniform and random, the same
is true for X. Hence, while the simulation recreates the correct quantum correlations, it does not extend quantum
theory in the sense of providing any extra information about future measurement outcomes. It is hence in agreement
with Part II of the proof.
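The uniformity argument can be verified by exact enumeration. The sketch below assumes the standard Popescu-Rohrlich form of a non-local box (outputs u, v are uniform subject to u XOR v = s AND t) and is a simplified stand-in, not the full simulation model of [30]; all names are hypothetical.

```python
from itertools import product

def pr_box(u, v, s, t):
    """Assumed PR-box probability of outputs (u, v) given inputs (s, t)."""
    return 0.5 if (u ^ v) == (s & t) else 0.0

def output_distribution(shared_bit, box_input, other_input):
    """Distribution of X = shared_bit XOR (local box output u)."""
    dist = {0: 0.0, 1: 0.0}
    for u, v in product((0, 1), repeat=2):
        dist[shared_bit ^ u] += pr_box(u, v, box_input, other_input)
    return dist

for shared_bit, s, t in product((0, 1), repeat=3):
    print(shared_bit, s, t, output_distribution(shared_bit, s, t))
```

Whatever the shared value and the inputs, the local output X is uniform in this toy model, so knowing the shared value alone gives no advantage in predicting it.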

However, because it recreates the quantum correlations, the simulation provides more information about the
outcomes of joint measurements. To see that this is incompatible with quantum theory, one would need to apply Part III
of our argument, using a description of how the model evolves under reversible operations. Such a description is not
given in the above model and, furthermore, in consistent theories which permit non-local boxes [34] the reversible
dynamics are known to be trivial [35]. They cannot therefore result in a state whose statistics are consistent with
those from a quantum evolution, and hence contradict Assumption QMb.